perf: implement batch processing in iterateEvalTree#406
Conversation
|
@seqbenchbot up main search-keyword-exact-match-warm |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## 329-batching-1 #406 +/- ##
==================================================
- Coverage 71.54% 70.58% -0.97%
==================================================
Files 220 221 +1
Lines 16568 20423 +3855
==================================================
+ Hits 11854 14415 +2561
- Misses 3840 5128 +1288
- Partials 874 880 +6 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
@seqbenchbot down e8eefca9 |
|
Nice, @cheb0 The benchmark with identificator Show summary
Have a great time! |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| for _, lid := range lids { | ||
| rawLid := lid.Unpack() | ||
| blockIdx := p.table.GetIDBlockIndexByLID(rawLid) | ||
| if p.midCache.blockIndex != int(blockIdx) { |
There was a problem hiding this comment.
nit: fillMIDs has this check inside. did you add it to avoid function call?
| // Get MIDs | ||
| if needMids > 0 { | ||
| timerMID.Start() | ||
| mids = idsIndex.GetMIDs(lidsSlice[0:needMids], mids[:0]) |
There was a problem hiding this comment.
nit: technically we can omit the lower bound if it equals 0
lidsSlice[:needMids]| return seq.MID(p.midCache.GetValByLID(uint32(lid))), nil | ||
| } | ||
|
|
||
| func (p *Provider) MIDs(lids []node.LID, out []seq.MID) ([]seq.MID, error) { |
There was a problem hiding this comment.
Why Provider has method for retrieving a batch of MID but there is no similar method for RID?
| defer searchBuffersPool.Put(buffers) | ||
| mids := buffers.mids | ||
| rids := buffers.rids | ||
| lidsBuffer := buffers.lids |
There was a problem hiding this comment.
Shouldn't you reset buffers since slices are reused?
| lidsBuf := lidsBuf{ | ||
| lids: make([]node.LID, 0, consts.LIDBlockCap), | ||
| } | ||
| return searchBuffers{ |
There was a problem hiding this comment.
It's better to return a pointer here, otherwise there will be unnecessary allocations since any is returned.
| filterMIDs := sw.Timer("filter_mids") | ||
| updateHist := sw.Timer("update_hist") |
There was a problem hiding this comment.
| filterMIDs := sw.Timer("filter_mids") | |
| updateHist := sw.Timer("update_hist") | |
| timerFilterMIDs := sw.Timer("filter_mids") | |
| timerUpdateHist := sw.Timer("update_hist") |
| } | ||
|
|
||
| type LIDsIter interface { | ||
| Lids(out []node.LID) []node.LID |
There was a problem hiding this comment.
| LIDs(out []node.LID) []node.LID |
There was a problem hiding this comment.
I'll leave it here since it is out of scope of this diff.
Take a look at https://github.com/ozontech/seq-db/blob/329-batching-iterate-eval-tree/frac/sealed/lids/iterator_desc.go#L121-L131 -- I guess you've introduced code duplication while performing rebase.
| return total, ids, hist, aggs, nil | ||
| } | ||
|
|
||
| func filterOutOfRangeMIDs(params SearchParams, mids []seq.MID, lidsSlice []node.LID) ([]seq.MID, []node.LID) { |
There was a problem hiding this comment.
I am not sure what purpose this function serves.
Per my understanding, we cannot iterate over seq.LID which correspond to seq.ID that lie outside of user-requested range [from; to] -- this is guaranteed because we calculate minLID and maxLID in getLIDsBorders and use those in all iterators to set boundaries.
Am I missing something?
| buffers := searchBuffersPool.Get().(searchBuffers) | ||
| defer searchBuffersPool.Put(buffers) | ||
| mids := buffers.mids | ||
| rids := buffers.rids |
There was a problem hiding this comment.
Starting a petition to protect Vim users and their descendants — we require spaces. This is how we navigate code. Thank you for your cooperation.
Maybe something like?
var (
total int
lastID seq.ID
ids seq.IDSources
)
buffers := searchBuffersPool.Get().(searchBuffers)
defer searchBuffersPool.Put(buffers)| } | ||
| // limit how much we drain from eval tree for one-by-one flow. ignored for batched flow | ||
| need = min(need, maxLidsToDrain) | ||
| needLids = min(needLids, maxLidsToDrain) |
There was a problem hiding this comment.
Maybe we can move this whole thing with calculating limits/offsets/etc to the batch? I mean something like:
if ok {
evalTreeIter = func(need int, _ lidsBuf) LIDsIter {
// batched flow: juts get a batch and return
return batchNode.NextBatch().Trim(need)
// Or return batchNode.NextBatch(need)
}
} else {
...
}
func (b LIDBatch) Trim(k int) LIDBatch {
b.lids = b.lids[:min(k, len(b.lids))]
return b
}
Description
Continuation of #390
iterateEvalTreeworks with batches of lids, requests batches of mids and ridsget_midstepI did some measurements for both patches (this combined with #390) vs main (used bitpack encoding in both branches). For small ordinary searches there is no benefit. For dense analytic queries there is a decent improvement.
For our k6 benchmark
seq-db-hist.js:2.3 sec=>650 msFor
seq-db-aggs.js:6.1 sec=>4.7 secHist over
_all_(warm query) (3 prod fractions):~37 ms=>~15 msPart of #329